Bioinformatics (Thomas Dandekar, Meik Kunz)


Clustering (Cluster Analysis) Statistical procedure for grouping objects into clusters whose members share similar characteristics. A distinction is made between supervised clustering (groups known in advance) and unsupervised clustering (groups unknown).
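The unsupervised case can be illustrated with a minimal sketch. k-means is used here only as one common example of a clustering method; the glossary entry does not prescribe it. The data points, the function name `k_means`, and the choice of starting centers are assumptions made for illustration.

```python
from math import dist


def k_means(points, centers, iterations=10):
    """Group points into len(centers) clusters; the groups are not known in advance."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return clusters, centers


# Hypothetical measurements with two features per object.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
clusters, centers = k_means(points, centers=[points[0], points[3]])
for members in clusters:
    print(members)
```

The result depends on the starting centers, which is why they are passed in explicitly here rather than chosen at random.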

Code Specification for the unambiguous representation or assignment of characters with the aid of a given character sequence (e.g. the genetic code, which uses base triplets to represent the 20 amino acids).
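As a sketch of how such a code assigns characters, the following builds the standard genetic code as a lookup table from base triplets to one-letter amino acid symbols and translates a short DNA sequence. The function name `translate` and the example sequence are assumptions made for illustration; `*` marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # → MAK
```

The table has 64 entries but only 20 distinct amino acids plus the stop signal, so several triplets map to the same amino acid, while each triplet still has exactly one meaning.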

COGs (clusters of orthologous genes) see last universal common ancestor.

Computers Data-processing machines. They typically consist of hardware (electronic switches, transistors, integrated circuits) and other parts (input and output devices, housings, etc.). They process instructions (software) in sequence to generate new results from the data, e.g. calculations, sequences, result lists or networks (typical results of bioinformatics calculations).

Consensus Sequence Conserved sequence motif derived from a multiple alignment of several sequences, e.g. the conserved residues of an enzyme (see also PSSM).
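One simple way to derive a consensus is to take the most frequent character in each column of the alignment. The sketch below assumes equally long, already aligned sequences; the function name `consensus` and the example alignment are hypothetical. Real tools usually also handle gaps and ties, which this sketch does not.

```python
from collections import Counter


def consensus(alignment):
    """Return the most frequent character of each alignment column."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # majority character of this column
        for column in zip(*alignment)
    )


# Hypothetical alignment of four nucleotide sequences.
alignment = ["ATGCA", "ATGCT", "ACGCA", "ATGGA"]
print(consensus(alignment))  # → ATGCA
```

A PSSM keeps the full per-column counts instead of only the majority character, which is why it retains more information than the consensus alone.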

Coronavirus see Pandemic.

Databases Databases (software component) collect and integrate biological data and make it available to the general public over the Internet via a serving computer (hardware component, called a “server”). A database holds the data that users look up, typically organized into many records. The individual properties of a record are held in separate data fields; how this looks in detail is determined by the data model. The data can be searched using a query (database query). A query language popular in bioinformatics for simple, smaller databases is the “Structured Query Language” (“SQL” for short); such a database is then called an SQL database. Important bioinformatics databases are mentioned many times in the book, e.g. GenBank (genome and nucleotide sequence data) and UniProt/Swiss-Prot (protein sequences).
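The terms record, data field, data model and query can be made concrete with a minimal sketch that uses Python's built-in `sqlite3` module and an in-memory SQL database. The table name `protein`, its fields, and all accession values and lengths are hypothetical and are not taken from GenBank or UniProt.

```python
import sqlite3

# In-memory SQL database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Data model: one table whose columns are the data fields of each record.
conn.execute(
    "CREATE TABLE protein (accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Hypothetical records, one tuple per record.
records = [
    ("HYP0001", "Homo sapiens", 393),
    ("HYP0002", "Mus musculus", 387),
    ("HYP0003", "Homo sapiens", 1210),
]
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", records)

# Database query: all human entries, longest first.
rows = conn.execute(
    "SELECT accession, length FROM protein WHERE organism = ? ORDER BY length DESC",
    ("Homo sapiens",),
).fetchall()
print(rows)  # → [('HYP0003', 1210), ('HYP0001', 393)]
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting errors when query terms come from user input.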

Data-Driven Modeling Normalization of the different units of the bioinformatic model according to the experimental data, i.e. the experimental data determine the typical times of the signaling cascade, receptor excitation, phosphorylation of kinases, etc.

Dimension Reduction see Principal Component Analysis (PCA).

DNA (deoxyribonucleic acid, DNA for short)

Biochemically, a chain of nucleotides that are all connected via a deoxyribose sugar and phosphate “backbone” to form a long molecule, the DNA single strand.

Bioinformatically of central importance because DNA contains all the genetic material (hereditary material, also called the genome) and thus all the hereditary information of an organism. The DNA single strand pairs on its own with its counterpart strand, so that DNA
